- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Nevada > Clark County > Las Vegas (0.04)
7690dd4db7a92524c684e3191919eb6b-AuthorFeedback.pdf
If we allow arrival to be a function of, say, model accuracy (Sec 3.4), then arrival may indeed diminish; in this ... As illustrated in Figure 1(a) in Appendix K.3, if user ... We believe there is value in performing long-term experiments to better understand such dynamics. We will adjust figures, add forward references, fix typos, and discuss intuition/comparisons. We will be happy to add this result. We trained binary classifiers over the Adult dataset by minimizing empirical loss, where the features are individual information (sex, race, nationality, etc.) and the labels are annual income (... These results (shown on the right) are consistent with the paper.
We have violations after CI since we do early stopping; satisfying them until the end can sometimes hurt overall ...
Thank you for your detailed comments. We will make all clarifications below in the next version. We note that our formulation in Sec 3.2 can handle any ... Learning the constraints automatically is a direction for future work. There are important differences compared to Diligenti et al. We said that Diligenti and Mehta are task-specific since they only experiment on a single task. Jin et al. assume an arbitrary non-convex non-concave form for a twice ... We will add this important theorem to the paper.
Agentic Design Review System
Nag, Sayan, Joseph, K J, Goswami, Koustava, Morariu, Vlad I, Srinivasan, Balaji Vasan
Evaluating graphic designs involves assessing them from multiple facets such as alignment, composition, aesthetics, and color choices. Evaluating designs holistically involves aggregating feedback from individual expert reviewers. Towards this, we propose an Agentic Design Review System (Agentic-DRS), where multiple agents collaboratively analyze a design, orchestrated by a meta-agent. A novel in-context exemplar selection approach based on graph matching and a unique prompt expansion method play a central role in making each agent design-aware. To evaluate this framework, we propose the DRS-BENCH benchmark. Thorough experimental evaluation against state-of-the-art baselines adapted to the problem setup, backed up by critical ablation experiments, brings out the efficacy of Agentic-DRS in evaluating graphic designs and generating actionable feedback. We hope that this work will attract attention to this pragmatic yet under-explored research direction.
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Washington > King County > Seattle (0.04)
Preliminary Explorations with GPT-4o(mni) Native Image Generation
Cao, Pu, Zhou, Feng, Ji, Junyi, Kong, Qingye, Lv, Zhixiang, Zhang, Mingjian, Zhao, Xuekun, Wu, Siqi, Lin, Yinghui, Song, Qing, Yang, Lu
Recently, OpenAI unlocked the visual generation ability of GPT-4o(mni). It demonstrates remarkable generation capability, with excellent understanding of multimodal conditions and varied task instructions. In this paper, we aim to explore the capabilities of GPT-4o across various tasks. Inspired by previous studies, we constructed a task taxonomy along with a carefully curated set of test samples to conduct a comprehensive qualitative test. Benefiting from GPT-4o's powerful multimodal comprehension, its image-generation process demonstrates abilities that go beyond traditional image-generation tasks. Thus, regarding the dimensions of model capabilities, we evaluate its performance across six task categories: traditional image generation tasks, discriminative tasks, knowledge-based generation, commonsense-based generation, spatially-aware image generation, and temporally-aware image generation. These tasks not only assess the quality and conditional alignment of the model's outputs but also probe deeper into GPT-4o's understanding of real-world concepts. Our results reveal that GPT-4o performs impressively well in general-purpose synthesis tasks, showing strong capabilities in text-to-image generation, visual stylization, and low-level image processing. However, significant limitations remain in its ability to perform precise spatial reasoning, instruction-grounded generation, and consistent temporal prediction. Furthermore, when faced with knowledge-intensive or domain-specific scenarios, such as scientific illustrations or mathematical plots, the model often exhibits hallucinations, factual errors, or structural inconsistencies. These findings suggest that while GPT-4o marks a substantial advancement in unified multimodal generation, there is still a long way to go before it can be reliably applied to professional or safety-critical domains.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Arkansas (0.04)
- Health & Medicine (0.92)
- Media > Photography (0.67)
- Food & Agriculture > Agriculture (0.67)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.66)
Computation Mechanism Behind LLM Position Generalization
Most written natural languages are composed of sequences of words and sentences. Similar to humans, large language models (LLMs) exhibit flexibility in handling textual positions - a phenomenon we term position generalization. They can understand texts with position perturbations and generalize to longer texts than those encountered during training with the latest techniques. These phenomena suggest that LLMs handle positions tolerantly, but how LLMs computationally process positional relevance remains largely unexplored. This work connects the linguistic phenomenon with LLMs' computational mechanisms. We show how LLMs enforce certain computational mechanisms for the aforementioned tolerance in position perturbations. Despite the complex design of the self-attention mechanism, this work reveals that LLMs learn a counterintuitive disentanglement of attention logits. Their values show a 0.959 linear correlation with an approximation of the arithmetic sum of positional relevance and semantic importance. Furthermore, we identify a prevalent pattern in intermediate features, which we prove theoretically enables this effect. The pattern, which is different from how randomly initialized parameters would behave, suggests that it is a learned behavior rather than a natural result of the model architecture. Based on these findings, we provide computational explanations and criteria for LLMs' position flexibilities. This work takes a pioneering step in linking position generalization with modern LLMs' internal mechanisms.
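To make the additive claim concrete, here is a toy sketch of what "attention logits correlate with a sum of positional relevance and semantic importance" could look like numerically. Everything in it is an assumption for illustration: the synthetic logits, the names `fit_additive` and `demo`, the choice to index positional relevance by relative distance `i - j`, and the backfitting procedure are not taken from the paper, which reports its 0.959 correlation on real LLM attention logits using its own approximation.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def fit_additive(logits, n, sweeps=20):
    """Backfit logits[(i, j)] ~ pos[i - j] + sem[j] over causal pairs j <= i.

    pos[d] is a hypothetical positional-relevance term for distance d,
    sem[j] a hypothetical semantic-importance term for key token j.
    """
    pos = [0.0] * n
    sem = [0.0] * n
    for _ in range(sweeps):
        # Update each distance term from the residuals that share that distance.
        for d in range(n):
            pos[d] = mean(logits[(i, i - d)] - sem[i - d] for i in range(d, n))
        # Update each key term from the residuals that share that key.
        for j in range(n):
            sem[j] = mean(logits[(i, j)] - pos[i - j] for i in range(j, n))
    return pos, sem


def demo(n=24, noise=0.1, seed=0):
    """Build synthetic additive logits, fit the two terms, return the correlation."""
    rng = random.Random(seed)
    true_pos = [-0.15 * d for d in range(n)]             # assumed decay with distance
    true_sem = [rng.gauss(0.0, 1.0) for _ in range(n)]   # assumed per-token importance
    logits = {
        (i, j): true_pos[i - j] + true_sem[j] + rng.gauss(0.0, noise)
        for i in range(n)
        for j in range(i + 1)
    }
    pos, sem = fit_additive(logits, n)
    pairs = sorted(logits)
    observed = [logits[p] for p in pairs]
    approx = [pos[i - j] + sem[j] for i, j in pairs]
    return pearson(observed, approx)


if __name__ == "__main__":
    print(f"correlation between logits and additive fit: {demo():.3f}")
```

On synthetic data that is additive by construction, the correlation is high by design; the point of the sketch is only to show the kind of quantity being measured, not to reproduce the reported result.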
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Export Reviews, Discussions, Author Feedback and Meta-Reviews
The authors propose a vision method to estimate where within an image a pictured head is looking. Given the image and the cropped-out head, the system returns a saliency map consisting of confidence ratings on grid cells, each indicating how likely that position is to be the target of the pictured person's gaze. The technique uses a CNN with two pathways, one for the head/gaze and one for the full image/saliency of the scene. The dataset contains 36K people in total. The method is compared to a few reasonable baselines that represent alternative approaches one might implement, along with sanity checks.
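As a rough illustration of the output described in this summary, the sketch below combines a per-cell gaze score and a per-cell scene saliency score into one grid of confidences. The review does not say how the two pathways are merged, so the element-wise product followed by normalization is an assumption, as are the function names `combine_pathways` and `most_likely_cell` and the toy 2x3 grids.

```python
def combine_pathways(gaze_mask, saliency):
    """Merge two same-shaped grids into confidences that sum to 1.

    gaze_mask: hypothetical output of the head/gaze pathway (non-negative).
    saliency:  hypothetical output of the full-image pathway (non-negative).
    The element-wise product is an assumed fusion rule, not the reviewed method.
    """
    rows, cols = len(saliency), len(saliency[0])
    product = [
        [gaze_mask[r][c] * saliency[r][c] for c in range(cols)]
        for r in range(rows)
    ]
    total = sum(sum(row) for row in product)
    if total == 0:
        # No evidence anywhere: fall back to a uniform map.
        return [[1.0 / (rows * cols)] * cols for _ in range(rows)]
    return [[value / total for value in row] for row in product]


def most_likely_cell(confidence):
    """Return the (row, col) index of the highest-confidence grid cell."""
    return max(
        ((r, c) for r in range(len(confidence)) for c in range(len(confidence[0]))),
        key=lambda rc: confidence[rc[0]][rc[1]],
    )


if __name__ == "__main__":
    gaze = [[0.1, 0.2, 0.7],
            [0.1, 0.3, 0.6]]
    scene = [[0.5, 0.1, 0.2],
             [0.1, 0.1, 0.9]]
    grid = combine_pathways(gaze, scene)
    print(most_likely_cell(grid))
```

With this fusion rule, a cell scores highly only when it is both in the estimated gaze direction and salient in the scene, which matches the intuition behind having two pathways.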